Which, how, and what? Using digital tools to train surgical skills; a systematic review and meta-analysis

Background Digital tools such as digital box trainers and VR seem promising for delivering safe and tailored practice opportunities outside of the surgical clinic, yet understanding their efficacy and limitations is essential. This study investigated Which digital tools are available to train surgical skills, How these tools are used, How effective they are, and What skills they are intended to teach. Methods Medline, Embase, and Cochrane libraries were systematically searched for randomized trials evaluating digital skill-training tools for surgical residents based on objective outcomes (skill scores and completion time). The effectiveness of digital tools was compared against controls, wet/dry lab training, and other digital tools. Subgroups based on tool and training factors were analysed, and studies were assessed on whether their primary outcomes were technical and/or non-technical. Results The 33 included studies involved 927 residents and six digital tools: digital box trainers, virtual reality (VR) trainers, immersive VR trainers, robot surgery trainers, coaching and feedback, and serious games. Digital tools outperformed controls in skill scores (SMD 1.66 [1.06, 2.25], P < 0.00001, I² = 83 %) and completion time (SMD −1.05 [−1.72, −0.38], P = 0.0001, I² = 71 %). There were no significant differences between digital tools and lab training, between tools, or in other subgroups. Only two studies focussed on non-technical skills. Conclusion While the efficacy of digital tools in enhancing technical surgical skills is evident, especially for VR trainers, there is a lack of evidence regarding non-technical skills and a need to improve the methodological robustness of research on new (digital) tools before they are implemented in curricula.
Key message This study provides critical insight into the increasing presence of digital tools in surgical training, demonstrating their usefulness while identifying current challenges, especially regarding methodological robustness and inattention to non-technical skills.


Introduction
Surgical residents need sufficient clinical training experience to develop their skills, achieve proficiency, and ultimately become competent surgeons. While clinical training is critical to achieve these goals, it is constrained by the available case-load and exposure and, most importantly, by patient safety [1,2]. As a result, residents also need training outside of daily clinical practice and the operating room (OR), which can be tailored to their educational needs and gives them the opportunity to practice and learn from mistakes without endangering patients [3,4].

Abbreviations: ASSET, Arthroscopic Surgical Skill Evaluation Tool; GOALS, Global Operative Assessment of Laparoscopic Skills; MIS, Minimally Invasive Surgery; NA, not available; NOTSS, Non-Technical Skills for Surgeons; NS, not specified; OSATS, Objective Structured Assessment of Technical Skills; PGY, Postgraduate Year; POV, Point of View; PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses; RCT, Randomized clinical trial; RoB 2 tool, Revised Cochrane risk of bias tool for randomized trials; ST, Specialty Trainee; VR, Virtual Reality.
Digital tools, such as virtual reality (VR), digital box trainers, and applications for mobile platforms (apps), can provide these training opportunities and are increasingly used by surgical educators, especially since the COVID-19 pandemic [5][6][7][8][9]. Numerous studies introduce or validate a digital tool, and several reviews evaluate these tools based on the technology used [10][11][12][13][14]. However, before these tools are implemented in surgical curricula and relied on to improve training, an overview of available tools, their merits, and the skills they aim to train is essential, and is currently missing.
Technical skills are an important aspect of surgical training, are well incorporated in surgical curricula, and are widely discussed in the literature. Conversely, although deficits in non-technical skills have been shown to negatively affect performance and surgical outcome, these skills are often regarded as more difficult to identify and teach [15][16][17][18][19]. Therefore, this systematic review and meta-analysis aims to answer the following three questions: Which digital tools are available to train surgical skills and what is their efficacy? How are these tools used? What skills (technical and/or non-technical) do these tools aim to train?

Material and methods
This systematic review and meta-analysis was performed in accordance with the Cochrane Handbook for Systematic Reviews of Interventions version 6.0 and PRISMA-guidelines [20,21].
Literature search
MEDLINE, EMBASE, and Cochrane databases were searched for studies assessing digital skill-training tools for surgical residents, published from January 1st 2010 up until the last search update on December 7th, 2022. Keywords related to digital training, skills, and competencies were incorporated in the search; the full search can be found in the Supplementary Material. Included articles were cross-referenced for additional relevant studies. Digital training was defined according to the European Commission definition: the pedagogical use of digital technologies to support and enhance learning, teaching and assessment [22]. Skills were defined according to the Merriam-Webster dictionary: "a learned power of doing something competently: a developed aptitude or ability" [23].
Randomized clinical trials (RCTs) were included in this review to attain the highest level of evidence and to enable comparison of digital tools. RCTs were eligible if they were published in Dutch or English, assessed digital training tools aimed at skill acquisition, and used objective performance indicators such as computed metrics or scoring tools. Studies using subjective outcomes, such as participant questionnaires or self-evaluation tools, were excluded. Additionally, conference proceedings, study protocols, and studies that evaluated multiple digital tools without assessing each tool separately were excluded. Two authors (TM and SvdS) screened all titles and abstracts and included studies for full-text appraisal when both reviewers agreed on inclusion. Disagreements were resolved by consulting a third reviewer (MPS). A standardized form was used to systematically extract data from the studies, including trained skills, study design, characteristics of participants and digital tools, addressed skills, outcomes, and factors affecting the efficacy of the training tool.

Tool availability, efficacy, and use
Studies were categorized according to the digital tool they examined. Overall efficacy was evaluated through meta-analyses of post-test outcomes on skill scores (checklist scores and computed metrics) and time (task completion time). Based on these data, digital tools were compared with a control group (receiving traditional and/or no additional training) and with training in a wet or dry lab. Within these comparisons, subgroups were created based on the studied digital tool to evaluate the efficacy of individual tools and the heterogeneity therein. If sufficient studies were available, digital tools were compared with other digital tools. To examine how utilization factors of digital tools affected outcomes, study data were pooled according to training structure (self-directed versus prescribed training or training to proficiency) and training duration (minutes-days versus weeks-months).
Meta-analyses on pooled data were performed using Cochrane's Review Manager (RevMan) 5.4 [24]. All extracted data were converted to standardized mean differences (Hedges' g effect size). When means and standard deviations (SD) were not available, other reported outcomes (P-values, medians, ranges, and 95 % confidence intervals (CI)) were used to estimate the effect size. If none of these data were provided, the study was excluded from the meta-analysis. A random-effects model was used in all analyses because of expected methodological heterogeneity (arising from the broad literature search) and statistical heterogeneity, which was quantified by calculating the I² statistic. Effect sizes were presented with 95 % CIs and deemed significant if P < 0.05. Because this review presents the limited evidence available, outcomes of meta-analyses were reported even in the presence of high heterogeneity [21].
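To make these pooling steps concrete, the sketch below computes Hedges' g for a two-arm trial, back-calculates a group SD from the 95 % CI of its mean, and pools effect sizes with a DerSimonian-Laird random-effects model that also reports I². It is an illustration under stated assumptions, not the authors' analysis code: we assume RevMan 5.4's inverse-variance random-effects method follows DerSimonian-Laird, the variance formula for g is one common large-sample approximation (RevMan's own may differ slightly), the SD-from-CI rule is the large-sample approximation for a single group mean, and the function names and trial numbers are hypothetical, not data from the included studies.

```python
import math
from dataclasses import dataclass


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges' g (group 1 minus group 2) and its approximate sampling variance."""
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled                          # Cohen's d
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    j = 1 - 3 / (4 * df - 1)                           # small-sample correction
    return j * d, j**2 * var_d


def sd_from_ci(lower, upper, n):
    """Approximate a group SD from the 95 % CI of its mean (large samples)."""
    return math.sqrt(n) * (upper - lower) / 3.92


@dataclass
class Pooled:
    smd: float
    ci_low: float
    ci_high: float
    p_value: float
    tau2: float
    i2: float  # percent


def random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling, reporting tau^2 and I^2."""
    w = [1 / v for v in variances]                     # inverse-variance weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0    # between-study variance
    w_re = [1 / (v + tau2) for v in variances]         # random-effects weights
    smd = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    p = math.erfc(abs(smd / se) / math.sqrt(2))        # two-sided z-test
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return Pooled(smd, smd - 1.96 * se, smd + 1.96 * se, p, tau2, i2)


# Hypothetical trials: (mean, SD, n) for intervention and control skill scores.
trials = [
    ((32.0, 5.0, 12), (25.0, 6.0, 12)),
    ((18.5, 3.2, 10), (17.0, 3.5, 10)),
    ((41.0, 7.5, 15), (30.0, 8.0, 14)),
]
effects, variances = zip(*(hedges_g(*a, *b) for a, b in trials))
result = random_effects(list(effects), list(variances))
print(f"SMD {result.smd:.2f} [{result.ci_low:.2f}, {result.ci_high:.2f}], "
      f"P = {result.p_value:.4f}, I2 = {result.i2:.0f} %")
```

The random-effects weights add the estimated between-study variance τ² to each study's own variance, which is why heterogeneous results such as those in this review produce wider pooled confidence intervals than a fixed-effect model would.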

Skills trained using digital tools
Studies were evaluated based on the skills they primarily aimed to train: technical skills, general non-technical competencies (according to the CanMEDS framework), and non-technical surgical skills (according to the NOTSS taxonomy) [25,26]. The CanMEDS framework identifies seven competencies (roles) each physician should master, based on the needs of the people they serve. The Medical Expert is identified as the role in which the six intrinsic roles are integrated: the Communicator, Collaborator, Leader, Health Advocate, Scholar, and Professional roles. The framework provides key and enabling competencies, which were used to assess the reported outcome measures in this review. The NOTSS taxonomy is aimed specifically at non-technical skills in the OR. It defines four skill categories (situation awareness, decision making, communication & teamwork, and leadership), each subdivided into three elements. The NOTSS system handbook describes these categories and elements in depth and was used to assess the primary outcome measures in this review [27]. A graphical overview of the CanMEDS roles and NOTSS categories can be found in Table 1. TMF and SvdS evaluated which skills were trained in each study, and whether each skill was included as a primary outcome or assessed in any way by the authors.

Methodological quality and bias
The methodological quality of the included studies was assessed using the revised Cochrane risk of bias tool for randomized trials (RoB 2 tool), which determines an overall risk of bias of randomized trials based on five bias domains: the randomization process, deviations from intended interventions, missing outcome data, measurement of the outcome, and selection of the reported result [28].
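The overall RoB 2 judgement can be sketched as an aggregation rule over the five domain judgements: low risk only if every domain is low, high risk if any domain is high, and some concerns otherwise. RoB 2 guidance also allows reviewers to rate a study as high risk when some concerns in multiple domains substantially lower confidence in the result; because that is a reviewer judgement rather than a fixed rule, this sketch models it as an opt-in flag. The names `Risk`, `overall_risk`, and `escalate_multiple_concerns` are hypothetical illustrations, not part of any official RoB 2 software.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Ordered so that max() returns the worst judgement."""
    LOW = 0
    SOME_CONCERNS = 1
    HIGH = 2


DOMAINS = (
    "randomization process",
    "deviations from intended interventions",
    "missing outcome data",
    "measurement of the outcome",
    "selection of the reported result",
)


def overall_risk(judgements, escalate_multiple_concerns=False):
    """Combine per-domain judgements into an overall RoB 2 judgement."""
    missing = [d for d in DOMAINS if d not in judgements]
    if missing:
        raise ValueError(f"missing domain judgements: {missing}")
    worst = max(judgements[d] for d in DOMAINS)
    if worst == Risk.SOME_CONCERNS and escalate_multiple_concerns:
        # Reviewer judgement: several domains with some concerns may justify "high".
        n_concerns = sum(judgements[d] == Risk.SOME_CONCERNS for d in DOMAINS)
        if n_concerns > 1:
            return Risk.HIGH
    return Risk(worst)


# Hypothetical trial: no pre-specified protocol and an under-reported randomization.
example = {d: Risk.LOW for d in DOMAINS}
example["selection of the reported result"] = Risk.SOME_CONCERNS
example["randomization process"] = Risk.SOME_CONCERNS
print(overall_risk(example).name)                                   # SOME_CONCERNS
print(overall_risk(example, escalate_multiple_concerns=True).name)  # HIGH
```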

Results
Eighteen hundred and fifty-one studies were screened based on title and abstract. A total of 178 full texts were reviewed, resulting in the inclusion of 33 studies comprising 927 residents. Fig. 1 depicts the PRISMA flow diagram of included studies, and Table 2 summarizes the study characteristics and describes demographics, study setting, and intervention protocols.

Study characteristics and available tools
The 33 included studies addressed six digital tools: … that has a challenging goal, is fun to play and engaging, incorporates some kind of scoring mechanism, and supplies the user with skills, knowledge or attitudes useful in reality" [62].

Digital tools compared to wet lab and dry lab training
Four (12.1 %) studies compared a digital tool with training in a wet and/or dry lab: three compared a VR trainer with wet/dry lab training [34,41,44], and Valdis et al. compared a robot trainer with training in both a wet lab and a dry lab [55]. As depicted in Fig. 3, digital tools and lab training did not differ significantly with regard to skill scores (SMD −0.11 [−0.45, 0.24], P = 0.55, I² = 10 %). Insufficient data were available to perform a comparison on skill completion time.

Subgroup analyses
Results of subgroup analyses are presented in Table 3; individual forest plots can be found in Supplemental Figs. 2-7.

Training factors subgroups
Differences in training structure and training duration do not explain the heterogeneity in outcomes of digital tools versus a control group. Studies using a prescribed training structure (i.e. training for a defined amount of time or training to proficiency) achieved slightly higher final scores and needed slightly less time, but the differences with studies using a self-directed approach were not significant (skill subgroup differences: P = 0.11, I² = 61 %; time subgroup differences: P = 0.36, I² = 0 %). The same pattern was observed for results pooled by training duration (hours to days versus weeks to months): subgroup outcomes differed slightly, but these differences were not significant (skill subgroup differences: P = 0.06, I² = 70.7 %; time subgroup differences: P = 0.10, I² = 64.1 %).
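The subgroup comparisons are reported as a test for subgroup differences (P) with an accompanying I². A minimal sketch of such a test, assuming it follows the common approach of treating each subgroup's pooled estimate and standard error as a single "study" and computing Cochran's Q across subgroups with (number of subgroups − 1) degrees of freedom, so that I² = (Q − df)/Q. With two subgroups (df = 1), an I² of 61 % implies Q ≈ 2.56 and P ≈ 0.11, consistent with the values reported above. The chi-square tail probability uses a series expansion of the regularized incomplete gamma function so the example needs only the standard library; the function names and the subgroup estimates in the usage lines are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2, x / 2
    # Series for the regularized lower incomplete gamma function P(a, x/2).
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15 and n < 10_000:
        term *= half_x / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(half_x) - half_x - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def subgroup_difference(estimates, std_errors):
    """Cochran's Q across subgroup summary estimates, its P-value, and I^2 (%)."""
    w = [1 / se**2 for se in std_errors]
    mean = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    q = sum(wi * (e - mean) ** 2 for wi, e in zip(w, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, chi2_sf(q, df), i2


# Hypothetical pooled SMDs (and SEs): prescribed versus self-directed training.
q, df, p, i2 = subgroup_difference([1.9, 1.2], [0.35, 0.30])
print(f"Chi2 = {q:.2f}, df = {df}, P = {p:.2f}, I2 = {i2:.0f} %")
```

Note that a non-significant subgroup test with a sizeable I², as seen for both training factors here, mainly signals low power with only two subgroups rather than evidence that the factors have no effect.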

Assessed skills
Only Graafland et al. and Lohre et al. used non-technical skills as primary outcomes: situation awareness and decision making, both within the NOTSS framework (Fig. 5) [40,51]. Components of the 'Medical Expert' and 'Scholar' CanMEDS roles overlapped with the technical skills trained and measured by all other studies. Fifteen (45.5 %) studies used skills checklists, such as the OSATS (Objective Structured Assessment of Technical Skills), ASSET (Arthroscopic Surgical Skill Evaluation Tool), and GOALS (Global Operative Assessment of Laparoscopic Skills). These checklists include non-technical items such as "use of assistants" and "flow of operation and forward planning", which were assigned to the 'Collaborator' role within CanMEDS and to the 'situation awareness', 'communication and teamwork', and 'decision making' categories within NOTSS. However, none of these studies reported on these non-technical checklist items in their outcomes [30,31,34,37,41,42,46,47,50,52,53,56,58,60,61]. The NOTSS category 'Leadership' and the CanMEDS roles 'Leader', 'Communicator', 'Health Advocate', and 'Professional' were not reported or measured by any study.

Methodological quality of included studies
Only two studies had an overall low risk of bias (Supplemental Fig. 8) [39,57]. All other studies raised at least some concerns, owing to the lack of a pre-specified study protocol (n = 30) and to insufficient specification of the randomization process and/or insufficiently blinded outcome assessors (n = 27).

Discussion
Research, development, and implementation of digital training tools for surgical residents have increased substantially in recent years and gained much attention during the COVID-19 pandemic. This systematic review and meta-analysis reveals that digital tools are widely and readily available, that most evidence is available for VR trainers, and that very few studies address non-technical skills. Most digital tools had positive effects on skill scores and performance time when compared with a control group, and no significant effects of training factors were observed. While this study presents the best available evidence, these results should be interpreted with caution because of the high associated heterogeneity (I² > 70 %).
In this light, two results can be interpreted with more certainty: in this review, VR trainers were as effective as box trainers, and as effective as training in a wet or dry lab. While the first outcome is in accordance with earlier systematic reviews, the latter has no precedent in the current literature [13]. Based on these results, box trainers, VR trainers, and wet/dry labs are all valid training methods, yet there are differences to consider. Wet/dry labs perform better with regard to training efficiency (the speed with which new skills are acquired), but lack the advantage of training in one's own time that the two digital tools offer [34,55]. Box trainers are widely available in different configurations and from different manufacturers and are probably the least costly of the three, yet they are often primarily aimed at novices [52,63]. When the aim is to support residents in working more autonomously, clinically relevant training tools (such as wet/dry labs or VR trainers) may be necessary before skills can be transferred to the OR [64]. VR training does not have these disadvantages, but can be expensive and time-consuming to develop [52]. Therefore, it is worth considering whether appropriate VR systems are already available before deciding to develop a new system for a training objective.
Most studies in this review compared a digital tool with a control group (receiving no additional training). While comparing an intervention with a placebo is a common and useful methodology in studies that evaluate medical interventions, this approach introduces several problems when it is used in educational research. Many digital tools had to be used in a structured way, dedicated time was provided, and the effects of their use were evaluated, while the control group received no additional training and none of this attention. While we believe that embedding digital tools is of the utmost importance to optimize their use, this difference in how the intervention is provided is problematic for the validity of the results. In essence, what all of these studies show is that if resident training is monitored, skills will likely improve. Due to the inherent attention bias, it is unclear whether this effect originates from the digital tool itself or from the imposed training. A remarkable example is the study of Adams et al., who observed that training on a gaming console is more effective for technical skills acquisition than training on a box trainer, provided that more hours are trained [65]. In the subgroup analyses we therefore aimed to evaluate the effects of training structure and duration. While we found suggestions of differences in training effects between these factors, the differences were not significant and the associated heterogeneity was high. While this makes the outcomes challenging to interpret, it clearly reveals the need to improve the quality of research on digital tools. "Proving" that a digital tool works in a study with these biases and ambiguities should not be sufficient support to implement and adopt the tool in surgical curricula, let alone to use it as a way to improve training and its efficiency.

We therefore strongly advocate improving the robustness of studies on digital tools. A start would be to adhere to reporting guidelines (most studies raised concerns about overall risk of bias due to the lack of a protocol and of information on randomization), and to diminish the effects of attention bias by providing equal training schedules to all study arms. The immersive VR studies, which all compared the intervention with reading textbooks and journals, illustrate this problem [42,50,51,61]. When these studies are compared with the study by Orzech et al. [52], who compared a box trainer with a VR trainer and with training in the OR, including a cost analysis, the greater external validity and meaningfulness of the results of the latter are evident.
In recent years, it has become clear that a lack of non-technical skills in a surgeon not only affects the performance of surgical teams, but may also lead to avoidable incidents and thus impact postoperative outcomes [15][16][17][18][66]. However, there is little focus on teaching and evaluating non-technical skills [67]. Hardly any digital tools targeting non-technical skills could be identified in this review, yet it seems improbable that these skills are not trained at all. Attitudes and non-technical skills are more likely to be trained on the job itself, or with non-digital simulation [68,69]. Yet there is no reason, other than a blind spot of the developer or educator, not to develop tools that support both technical and non-technical skills, or not to evaluate the effect of digital resources on non-technical skills with the same objective methodology as their technical equivalents [67,70,71]. Promising technologies in this regard are VR, AR (augmented reality), MR (mixed reality), and telementoring solutions, as well as the Metaverse and medical data recorders in the OR. VR, AR, and MR training have been shown to increase both knowledge and motivation, and to provide insight into the work ethics, personality, and communication skills of various trainees in medicine [72,73]. Additionally, telementoring can support both mentee and mentor, and reduce the burden of giving written feedback. Data output from a medical data recorder may help to assess and, upon analysis, improve the non-technical skills performance of surgical teams. Using such a system is known to benefit surgical teams and to influence human factors that relate positively to surgical performance [17,74].
Generating more scientific data on whether, and how, digital tools can help enhance the skills and traits described in the intrinsic CanMEDS roles and NOTSS would be a first step [67]. Upcoming innovative educational tools such as virtual rounds, video-based learning, livestreamed surgical cases, artificial intelligence-based analysis of surgical performance, and many other tools and resources may prove invaluable in surgical resident training in the future [75][76][77][78]. It is therefore up to surgical educators and residents to stay on top of these innovations and identify training requirements, thereby targeting specific didactic needs and providing tailored education.
A possible approach to support digital training of non-technical skills in surgery is to follow the template of the introduction of the OSATS checklist in 1996 [79,80]. Projecting the OSATS approach onto the CanMEDS roles and NOTSS skills requires translating them into standardized, measurable non-technical skill indicators specific to surgical practice.

There are several limitations to this study. Included studies varied in study methodology, overall risk of bias, and heterogeneity, and most studies were affected by confounding from novelty, availability, attention, and/or compliance, to name a few. While the meta-analyses are therefore of suboptimal value, we chose to perform them nevertheless to provide the best available evidence and reveal its limitations. Including studies reporting on subjective outcomes may have resulted in identifying and including more studies focussing on non-technical skills. However, resources are consistently evaluated objectively on their technical outcomes in controlled studies. For them to be truly advantageous, they need to be able to improve real-life skills, including non-technical skills. We therefore believe that their effect on non-technical skills needs to be evaluated with the same methodological setup. Lastly, very little information is available on the effect of PGY on outcomes: only one study differentiated between PGYs, and it found inconsistent results and was not powered for this outcome [41].
T.M. Feenstra et al.

1. Digital box trainers (n = 4, 12.1 %): training box with a camera, instruments, and training exercises, enhanced by digital computation of performance metrics.

Fig. 2. a: Effects of digital tools versus controls on skill outcomes. b: Effects of digital tools versus controls on time outcomes.

Fig. 3. Effects of digital tools versus wet and dry lab on skill scores.
An initial step would be to implement non-technical scoring systems digitally into surgical curricula. Combining such a non-technical skills checklist with technical skills assessments such as the OSATS would result in a more comprehensive overall surgical skills assessment of the resident. Currently, new systems are being developed to digitally advance education and evaluation of technical and non-technical skills, …

Fig. 4. a: Effects of VR trainers versus box trainers on skill scores. b: Effects of VR trainers versus box trainers on skill completion time.

Table 1
Definition of the seven CanMEDS roles and four NOTSS competencies.

Table 2
Characteristics of included studies.

Table 2 (continued)
a When non-technical skills are presented between brackets, they were assessed by the authors, but outcomes specific for that non-technical skill are not presented in the manuscript.

Table 3
Meta-analysis of subgroup analyses on skill scores and performance time.